class: center, middle, inverse, title-slide

# Lecture 4
## Statistical Models and Notation
### Psych 10 C
### University of California, Irvine
### 04/06/2022

---

## Objective in research

- One of our main objectives in research is to contrast our beliefs about the world with the outcomes of experiments.

--

- We do so by starting with some "verbal" statement or belief about the world, which we then formalize using a statistical model.

--

- Statistical models allow us to make predictions about future observations. In the case of an experiment, they allow us to make predictions about the outcomes.

--

- The next step is to evaluate those predictions by comparing them with the outcomes (data) of the experiment.

--

- Finally, we would like to go back and interpret the results of our evaluations with respect to our original beliefs or statements about the world.

---

## Statistical models

- Statistical models are abstract representations of the world.

--

- They are a way in which we can formalize our beliefs about probabilistic events.

--

- For example, suppose we have an experiment where we toss a coin, and we have two competing ideas about the coin:

--

- The coin is **fair**.

--

- The coin is **not fair**.

--

- We can formalize these two beliefs using a statistical model.

--

- The coin is fair: `\(P(\{heads\})\ =\ P(\{tails\})\ =\ 0.5\)`

--

- The coin is not fair: `\(P(\{heads\})\ \neq\ P(\{tails\})\)`

--

- We moved from two verbal statements about our beliefs regarding the coin to two formal statements about the probability of "heads".

---

## Statistical Models

- Statistical models are the formal representation of our beliefs or hypotheses about the outcomes of an experiment.

--

- Given that we assume that the outcomes are probabilistic, our models will have a probabilistic component associated with them.

--

- Given the nature of our observations, it will be almost impossible for us to tell if a model is TRUE or FALSE.
However, we can compare how useful different models are in a given situation.

--

- Statistical models allow us to make predictions about our observations, which we then use to compare how useful the models are.

--

- However, before we continue, it will be useful to introduce some notation!

--

- This will provide us with a formal and standard way to express our models.

---
class: inverse, middle, center

# Notation

---

## Example:

- To introduce notation we will start with a problem.

--

- **Problem:** We want to know if people who smoke have lower lung capacity than people who do not smoke.

--

- We have a variable that we are interested in: lung capacity, as measured by some standard test.

--

- We also have a variable that indicates whether a given participant smokes or not.

--

- We call the first one a **dependent** variable, because we want to see how it "depends" on the values of another variable.

--

- We call the smoker indicator variable an **independent** variable. We are interested in how our independent variable affects the values of our dependent variable.

--

- In other words, we want to know if lung capacity is a function of smoking status.

---

## Example: Smoking

- We collect data from 8 participants: 4 smokers and 4 non-smokers.

--

- We will denote values of our dependent variable using `\(y\)`. For example, the first observation of our first group (non-smokers) is denoted `\(y_{11}\)`, while the fourth observation of the same group is denoted `\(y_{41}\)`.

---

## Example: Smoking

- In general, the *i-th* observation of the *j-th* group is denoted `\(y_{ij}\)`. Note that the letters `\(i\)` and `\(j\)` stand for a general observation; if we want to look at a particular one we can write:

--

- `\(y_{21}=\)` 77.6
- `\(y_{32}=\)` 73.6

--

- Now that we have a notation for our observations, we need a way to describe their variability.

--

- Remember that our objective is to formalize our beliefs or hypotheses about the world.

--

- Because our observations are probabilistic, we describe that variability with a probability distribution.

--

- In order to do this we will use the normal distribution.

---
class: inverse, middle, center

# The Normal (Gaussian) Distribution

---

## The Normal distribution

- The normal distribution is one of the most widely used statistical models in the literature.

--

- One of its main advantages is that it can be described using two parameters, `\(\mu\)` and `\(\sigma^2\)`.

--

- We denote the Normal distribution as `\(\text{Normal}(\mu,\sigma^2)\)`.

---

## Standard Normal distribution

- `\(\text{Normal}(\mu = 0,\sigma^2 = 1)\)`

<img src="data:image/png;base64,#lec-4_files/figure-html/norm-examp-1.png" style="display: block; margin: auto;" />

---

## Normal distribution

- The first parameter of the normal distribution, `\(\mu\)`, represents the center of the distribution. Notice that this is the value that has the highest density.

--

- The second parameter, `\(\sigma^2\)` (or `\(\sigma\)`), controls the dispersion of the normal distribution.

--

- For example, two normal distributions with the same variance (the same value of `\(\sigma^2\)`) but different values of `\(\mu\)` can be drawn in R using:

.pull-left[
```r
library(ggplot2)

# Two normal densities with the same sd (1) and different means (0 and 2)
ggplot(data = data.frame(support = c(-4, 6))) +
  aes(x = support) +
  scale_y_continuous(breaks = NULL) +  # hide the y-axis ticks
  ylab(label = "") +
  xlab("Support") +
  theme_classic() +
  theme(axis.title.x = element_text(size = 20)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), colour = "red") +
  stat_function(fun = dnorm, args = list(mean = 2, sd = 1), colour = "blue")
```
]

.pull-right[
<img src="data:image/png;base64,#lec-4_files/figure-html/norm-ex-1-out-1.png" style="display: block; margin: auto;" />
]

---

## Normal distribution

- An example of two normal distributions with the same value of `\(\mu\)` but different `\(\sigma^2\)` would be:

.pull-left[
```r
library(ggplot2)

# Two normal densities with the same mean (0) and different sd (1 and 3)
ggplot(data = data.frame(support = c(-9, 9))) +
  aes(x = support) +
  scale_y_continuous(breaks = NULL) +  # hide the y-axis ticks
  ylab(label = "") +
  xlab("Support") +
  theme_classic() +
  theme(axis.title.x = element_text(size = 20)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), colour = "red") +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 3), colour = "blue")
```
]

.pull-right[
<img src="data:image/png;base64,#lec-4_files/figure-html/norm-ex-2-out-1.png" style="display: block; margin: auto;" />
]

--

- Note that once we have assigned values to the parameters `\(\mu\)` and `\(\sigma^2\)` (or `\(\sigma\)`, which is what `dnorm()` takes as its `sd` argument), we have defined a Normal distribution completely.

--

- In other words, we know the density assigned to each possible value of the random variable.
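---

## Normal distribution

- To make the last point concrete, here is a minimal sketch of how fixing the parameters determines the density at any value. The values of `mu`, `sigma`, and `x` are illustrative assumptions and are not taken from the smoking data. The hand-written formula is the standard Normal density, which these slides do not otherwise state.

```r
# Illustrative parameter values (assumptions, not from the lecture data)
mu    <- 0
sigma <- 1            # dnorm() expects the standard deviation, not the variance
x     <- c(-1, 0, 1)  # points at which we evaluate the density

# Density computed with R's built-in function
dens_builtin <- dnorm(x, mean = mu, sd = sigma)

# The same density written out from the standard Normal density formula
dens_formula <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))

print(dens_builtin)
print(all.equal(dens_builtin, dens_formula))

# The density is highest at x = mu
print(x[which.max(dens_builtin)])
```

- Changing `mu` shifts where the highest density sits, while changing `sigma` changes how spread out the density is, which matches the two plots above.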